We introduce M-VADER: a diffusion model (DM) for image generation whose output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified by combinations of an image and text, as well as by combinations of multiple images. Previously, a number of successful DM image-generation algorithms have been introduced that make it possible to specify the output image using a text prompt. Inspired by the success of those models, and guided by the notion that language was already developed to describe the elements of visual contexts that humans find most important, we introduce an embedding model closely related to a vision-language model. Specifically, we introduce the embedding model S-MAGMA: a 13-billion-parameter multimodal decoder combining components from the autoregressive vision-language model MAGMA and biases finetuned for semantic search.
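The core interface — conditioning generation on an arbitrary mix of images and text — can be sketched as embedding each prompt element into a shared space and concatenating the token embeddings. Everything below (the stand-in embedders, the 64-dimensional space, the token counts) is a hypothetical illustration, not the actual S-MAGMA components:

```python
import numpy as np

def embed_multimodal_prompt(elements, embed_image, embed_text, dim=64):
    """Sketch: turn an arbitrary mix of images and strings into one
    conditioning sequence by embedding each element and concatenating
    the resulting token embeddings along the sequence axis."""
    parts = []
    for e in elements:
        emb = embed_text(e) if isinstance(e, str) else embed_image(e)
        assert emb.shape[1] == dim  # all modalities share one embedding space
        parts.append(emb)
    # (total_tokens, dim): a sequence a diffusion model could cross-attend to
    return np.concatenate(parts, axis=0)

# Stand-in embedders; a real system would use the multimodal decoder here.
embed_text = lambda s: np.zeros((len(s.split()), 64))   # one token per word
embed_image = lambda img: np.zeros((4, 64))             # e.g. 4 visual tokens per image

seq = embed_multimodal_prompt(["a red chair", np.ones((8, 8))], embed_image, embed_text)
print(seq.shape)  # (7, 64): 3 text tokens + 4 image tokens
```

The point of the sketch is only that mixed prompts reduce to one embedding sequence, which is what lets a single conditioning mechanism handle text, images, or both.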
Deep neural networks have emerged as the workhorse for a large section of robotics and control applications, especially as models for dynamical systems. Such data-driven models are in turn used for designing and verifying autonomous systems. This is particularly useful in modeling medical systems, where data can be leveraged to individualize treatment. In safety-critical applications, it is important that the data-driven model be conformant to established knowledge from the natural sciences. Such knowledge is often available, or can often be distilled into a (possibly black-box) model $M$: for instance, the unicycle model for an F1 racing car. In this light, we consider the following problem: given a model $M$ and a state-transition dataset, we wish to best approximate the system model while remaining a bounded distance away from $M$. We propose a method to guarantee this conformance. Our first step is to distill the dataset into a few representative samples called memories, using the idea of a growing neural gas. Next, using these memories, we partition the state space into disjoint subsets and compute bounds that the neural network should respect when its input is drawn from a particular subset. This serves as a symbolic wrapper for guaranteed conformance. We argue theoretically that this leads to only a bounded increase in approximation error, which can be controlled by increasing the number of memories. We experimentally show that, on three case studies (car model, drones, and artificial pancreas), our constrained neurosymbolic models conform to specified models $M$ (each encoding various constraints) with order-of-magnitude improvements compared to augmented-Lagrangian and vanilla training methods.
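The symbolic wrapper can be pictured as a nearest-memory lookup followed by output clamping. The sketch below is a simplified illustration under strong assumptions (1-D state, interval bounds per cell, memories already computed); it is not the paper's growing-neural-gas construction:

```python
import numpy as np

def nearest_memory(x, memories):
    """Index of the closest memory; defines which state-space cell x lies in."""
    return int(np.argmin(np.linalg.norm(memories - x, axis=1)))

def conformant_predict(x, nn_pred, memories, lower, upper):
    """Project the raw network prediction into the bound interval of x's cell.

    lower/upper are hypothetical per-cell bounds derived from the reference
    model M (e.g. M evaluated over the cell, padded by a tolerance), so the
    wrapped output is guaranteed to stay a bounded distance from M.
    """
    i = nearest_memory(x, memories)
    return float(np.clip(nn_pred, lower[i], upper[i]))

# Toy example: two memories with bound intervals [0, 1] and [10, 11].
memories = np.array([[0.0], [10.0]])
lower = np.array([0.0, 10.0])
upper = np.array([1.0, 11.0])
# A wildly non-conformant prediction of 3.0 near the second memory is clamped.
print(conformant_predict(np.array([9.5]), 3.0, memories, lower, upper))  # -> 10.0
```

Whatever the unconstrained network outputs, the wrapper's output lies in the cell's interval, which is the sense in which conformance is guaranteed rather than merely encouraged by a training penalty.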
"Induction heads" are attention heads that implement a simple algorithm for completing token sequences, such as [A][B]...[A] -> [B]. In this work, we present preliminary and indirect evidence for the hypothesis that induction heads may constitute the mechanism behind the majority of all "in-context learning" in large transformer models (i.e., the decrease in loss at increasing token indices). We find that induction heads develop at precisely the same point in training as a sudden, sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong causal evidence; for larger models with MLPs, we present correlational evidence.
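The completion rule attributed to induction heads can be written out directly. The toy function below illustrates only the pattern-matching behavior, not the attention mechanism that implements it:

```python
def induction_complete(tokens):
    """Toy version of the induction-head algorithm: if the final token [A]
    occurred earlier in the sequence, predict the token [B] that followed
    that earlier occurrence ([A][B]...[A] -> [B])."""
    last = tokens[-1]
    # scan backwards over earlier positions for the most recent prior match
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # no prior occurrence: the rule does not fire

print(induction_complete(["the", "cat", "sat", "on", "the"]))  # -> "cat"
```

The hypothesis in the abstract is that a head computing essentially this lookup, over learned representations rather than raw tokens, accounts for much of in-context learning.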
With Moore's law saturating and Dennard scaling hitting a wall, traditional von Neumann systems cannot deliver the GFLOPS/Watt demanded by compute-intensive algorithms such as CNNs. Recent trends in unconventional computing approaches give us hope of designing highly energy-efficient computing systems for such algorithms. Neuromorphic computing is a promising approach, owing to its brain-inspired circuits, use of emerging technologies, and low-power nature. Researchers have demonstrated neuromorphic computers using various novel technologies such as memristors, silicon photonics, FinFETs, and carbon nanotubes. However, a flexible CAD tool that starts from neuromorphic logic design and performs architectural simulation has yet to be demonstrated to support the rise of this promising paradigm. In this project, we aim to build NeuCASL, an open-source Python-based full-system CAD framework for neuromorphic logic design, circuit simulation, and system performance and reliability estimation. To the best of our knowledge, this is the first of its kind.
A major challenge in studying the robustness of deep learning is defining the set of "meaningless" perturbations to which a given neural network (NN) should be invariant. Most work on robustness implicitly uses humans as the reference model defining such perturbations. Our work offers a new view of robustness by using another reference NN to define the set of perturbations a given NN should be invariant to, thus generalizing the reliance on a human reference to any NN. This makes measuring robustness equivalent to measuring the extent to which two NNs share invariances, for which we propose a measure called STIR. STIR repurposes existing representation-similarity measures to make them suitable for measuring shared invariances. Using our measure, we gain insight into how shared invariances vary with changes in weight initialization, architecture, loss function, and training dataset. Our implementation is available at: \url{https://github.com/nvedant07/stir}.
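The abstract says STIR repurposes existing representation-similarity measures. One widely used measure of that family is linear CKA, sketched below; its use here is illustrative of the family, not a claim about STIR's exact construction:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices of shape (n_samples, dim).

    Returns 1.0 when the representations are identical up to an orthogonal
    transform, and values near 0.0 when they are unrelated."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
print(round(linear_cka(X, X), 3))  # 1.0: a representation is fully similar to itself
```

Measuring shared invariance then amounts to comparing, under such a measure, how two networks represent inputs that one of them (the reference) treats as equivalent.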
Over the years, datasets have been developed for various object-detection tasks. Object detection in the maritime domain is essential for the safety and navigation of ships. However, there is still a lack of publicly available large-scale datasets in the maritime domain. To overcome this challenge, we present KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain from KRISO (Korea Research Institute of Ships and Ocean Engineering). We collected 5,845 hours of video data captured from 21 territorial waters of South Korea. Through an elaborate data-quality assessment process, we gathered around 2,151,470 4K-resolution images from the video data. The dataset covers a variety of environmental conditions: weather, time, illumination, occlusion, viewpoint, background, wind speed, and visibility. KOLOMVERSE consists of five classes (ship, buoy, fishnet buoy, lighthouse, and wind farm) for maritime object detection. The images are 3840 $\times$ 2160 pixels and, to the best of our knowledge, this is the largest publicly available dataset to date for object detection in the maritime domain. We performed object-detection experiments and evaluated our dataset on several pretrained state-of-the-art architectures to show its effectiveness and usefulness. The dataset is available at: \url{https://github.com/maritimedataset/kolomverse}.
Adversarial training (AT) and its variants have made substantial progress over the past few years in improving the robustness of neural networks to adversarial perturbations and common corruptions. The algorithmic design of AT and its variants focuses on a specified perturbation strength $\epsilon$, and only uses feedback from the performance of that $\epsilon$-robust model to improve the algorithm. In this work, we focus on models trained over a spectrum of $\epsilon$ values. We analyze three perspectives: model performance, intermediate feature precision, and convolutional filter sensitivity. In each case, we identify alternative improvements to AT that are otherwise not apparent at a single $\epsilon$. Specifically, we find that for a PGD attack at some strength $\delta$, there is a model trained at some slightly larger strength $\epsilon$, but no larger, that generalizes best against it. Hence, we propose overdesigning for robustness: training models at an $\epsilon$ slightly larger than $\delta$. Second, we observe (across various $\epsilon$ values) that robustness is highly sensitive to the precision of intermediate features, especially after the first and second layers. Thus, we propose adding a simple quantization to defenses, which improves accuracy against both seen and unseen adaptive attacks. Third, we analyze the convolutional filters of each layer of models trained at increasing $\epsilon$ and note that the filters of the first and second layers may be solely responsible for amplifying input perturbations. We present our findings and demonstrate our techniques through experiments with ResNet and WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.
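The quantization defense amounts to snapping intermediate activations to a coarse grid, so that small feature-space perturbations within a bin collapse to the same value. A minimal sketch, where the bin count and value range are illustrative choices rather than the paper's settings:

```python
import numpy as np

def quantize_features(h, n_levels=16, lo=0.0, hi=1.0):
    """Uniformly quantize activations to n_levels values in [lo, hi].

    Perturbations of a feature that stay inside one quantization bin are
    snapped to the same level, blunting their downstream effect."""
    h = np.clip(h, lo, hi)
    step = (hi - lo) / (n_levels - 1)
    return lo + np.round((h - lo) / step) * step

h = np.array([0.10, 0.11, 0.52])
print(quantize_features(h, n_levels=11))  # 0.10 and 0.11 collapse to the same level
```

In a network this would sit after an early layer (where the abstract finds precision matters most); being a step function, it would typically be paired with a straight-through gradient if used during training.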
The performance of machine learning models can degrade significantly under distribution shifts in the data. We propose a new method for classification that can improve robustness to distribution shifts by combining "high-level" structure of the data with standard classifiers. We first group the training data into clusters according to this high-level structure. Then, within each cluster, we learn local classifiers based on finer discriminative features, via standard models such as deep neural networks. We establish generalization bounds for memory classifiers. We illustrate in experiments that they can improve generalization and robustness to distribution shifts on image datasets. We show improvements over standard data-augmentation techniques.
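The two-stage structure — route by coarse cluster, then classify with a local model — can be sketched in a few lines. The grouping function and the nearest-class-mean local classifiers below are simplified stand-ins for the paper's clustering and deep-network components:

```python
import numpy as np

class MemoryClassifier:
    """Minimal sketch: group inputs by a high-level structure function, then
    fit a simple local classifier (nearest class mean) within each group."""

    def __init__(self, group_fn):
        self.group_fn = group_fn  # maps x -> cluster id (high-level structure)
        self.local = {}           # cluster id -> {label: class mean}

    def fit(self, X, y):
        buckets = {}
        for x, label in zip(X, y):
            buckets.setdefault(self.group_fn(x), {}).setdefault(label, []).append(x)
        for g, by_label in buckets.items():
            self.local[g] = {l: np.mean(v, axis=0) for l, v in by_label.items()}
        return self

    def predict(self, x):
        means = self.local[self.group_fn(x)]  # route to x's cluster first
        return min(means, key=lambda l: np.linalg.norm(x - means[l]))

# Toy usage: the "high-level structure" is just the sign of the first feature.
X = [np.array([-1.0, 0.0]), np.array([-1.2, 0.4]),
     np.array([2.0, 0.0]), np.array([2.2, 0.4])]
y = ["left-a", "left-b", "right-a", "right-b"]
clf = MemoryClassifier(lambda x: int(x[0] > 0)).fit(X, y)
print(clf.predict(np.array([-1.05, 0.05])))  # -> "left-a"
```

Because the coarse routing depends only on high-level structure, a shift that alters fine-grained features need not change which local classifier handles an input, which is one intuition for the robustness gains.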
Long-horizon robotics learning tasks with sparse rewards pose a significant challenge for current reinforcement learning algorithms. A key feature enabling humans to learn challenging control tasks is that they often receive expert interventions that allow them to understand the high-level structure of the task before mastering low-level control actions. We propose a framework for leveraging expert interventions to solve long-horizon reinforcement learning tasks. We consider \emph{option templates}, which are specifications encoding a potential option that can be trained using reinforcement learning. We formulate expert intervention as allowing the agent to execute option templates before learning an implementation. This enables the agent to use an option before committing the costly resources required to learn it. We evaluate our approach on three challenging reinforcement learning problems, showing that it outperforms state-of-the-art approaches. Videos of trained agents and our code can be found at: https://sites.google.com/view/stickymittens
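One way to picture an option template is as a specification of where an option applies and what it achieves, which an expert can execute abstractly before any low-level policy exists. The fields and the one-dimensional toy task below are hypothetical simplifications, not the paper's formalism:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class OptionTemplate:
    """Spec of a latent option: its initiation set and its abstract effect."""
    applicable: Callable  # state -> bool: where the option may be invoked
    effect: Callable      # state -> state: postcondition the option achieves

def expert_step(state, template):
    """Expert intervention: apply the template's abstract effect directly,
    so the agent can plan with the option before its policy is learned."""
    return template.effect(state) if template.applicable(state) else state

# Toy: a "move to the goal" template on a 1-D line (hypothetical task).
goto_goal = OptionTemplate(applicable=lambda s: s < 10, effect=lambda s: 10)
print(expert_step(3, goto_goal))  # -> 10
```

Once the agent has learned the task's high-level structure using such shortcuts, each template's effect is replaced by a low-level policy trained with reinforcement learning.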
Online bipartite-matching platforms are ubiquitous, with applications in important areas such as crowdsourcing and ridesharing. In its most general form, such a platform consists of three entities: the two sides to be matched, and a platform operator that decides the matching. The design of algorithms for such platforms has traditionally focused on the operator's (expected) profit. Since fairness has become an important consideration that existing algorithms ignored, a collection of online matching algorithms has been developed that give a fair-treatment guarantee to one side of the market, at the expense of a drop in the operator's profit. In this paper, we generalize the existing work to offer fair-treatment guarantees to both sides of the market simultaneously, at a calculated worst-case drop in operator profit. We consider group and individual Rawlsian fairness criteria. Moreover, our algorithms have theoretical guarantees and adjustable parameters that can be tuned to balance the trade-off between the utilities of the three sides. We also derive hardness results that give clear upper bounds on the performance of any algorithm.
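The Rawlsian (max-min) criterion can be illustrated with a deliberately simple greedy rule: always serve the currently worst-off participant. This sketch is only an illustration of the objective on one side of the market; it is not the paper's algorithm, has no guarantees, and ignores the operator's profit and tunable parameters:

```python
def rawlsian_dispatch(arrivals, drivers, utility):
    """Assign each arriving request to a compatible driver, always picking
    the driver with the lowest utility accrued so far (max-min fairness).
    Drivers may serve multiple requests, as in ridesharing.

    utility[(r, d)]: value driver d gains from serving request r; a pair
    absent from the dict means r and d are incompatible."""
    gained = {d: 0.0 for d in drivers}
    assignment = []
    for r in arrivals:
        compat = [d for d in drivers if (r, d) in utility]
        if not compat:
            continue  # request cannot be served
        d = min(compat, key=lambda d: gained[d])  # help the worst-off driver
        assignment.append((r, d))
        gained[d] += utility[(r, d)]
    return assignment

# Toy usage: with equal utilities, requests alternate between the two drivers.
u = {(r, d): 1.0 for r in (1, 2, 3) for d in ("a", "b")}
print(rawlsian_dispatch([1, 2, 3], ["a", "b"], u))
```

Extending such a rule to guarantee fairness to both sides simultaneously, while bounding the loss in operator profit, is exactly where the difficulty (and the paper's contribution) lies.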